Mark scheme: Justified – understanding of the data and their
context
Introduction
Key results
Conclusion References (for reason chosen research question)
Narrative shows critical thinking
Multiple data sources
Recommendations and conclusions are grounded in the data
Limitations of the data set discussed. Next steps suggested in terms of
data that would allow for further analysis.
There are two types of emergency contraceptive pill prescribed by pharmacies.
- Levonogestrel (brand name: Levonelle) taken within three days of unprotected sex
- Ulipristal Acetate (brand name: ellaOne) taken within five days of unprotected sex
This report explores
Prescribing patterns of emergency contraceptive pill (ECP) between 2019 to 2023.
How ECP and antibiotics for STI prescriptions differ in geography comparing university town to a non-university town
Compare patterns in ECP and STI antibiotics prescribing during Covid-19 lockdowns - did the need for ECP or incidence of STI antibiotics prescription go down during lockdowns?
How does SIMD relate to prescription of ECP / antibiotics for STIs
# load necessary libraries
library(tidyverse)
library(janitor)
library(gt)
library(here)
library(lubridate)
library(sf)
library(ggspatial)
library(plotly)
I have chosen to look at all prescriptions from 2019 and 2022. To do this I downloaded the prescriptions data for each month from: https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community
I wanted to explore how many ECP and antibiotics for STIs were prescibed during across 2023.
4 plots and / or tables No use of settings other than default Labelled, titled plots Labelled titled tables Appropriate and thoughtful data visualisation with some use of non-default settings (gt) Faceting, multilayered plots, interactive visualisation
Read in data:
# Read in the Health Board names (HB_names). See code for link
HB_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv") %>%
clean_names() # ensures column names are unique, lower case and replaces spaces and special characters with underscores. I will use this function for all read ins to ensure my data is consistent and easy to manipulate.
# Read in population data per Health Board from the 2022 census data. Available from: https://statistics.ukdataservice.ac.uk/dataset/scotland-s-census-2022-uv102a-age-by-sex/resource/b2d295c2-af53-4b3d-a075-7815cadd9060
all_population_data <- read_csv(here("data", "UV103_age_health_board_census.csv"), skip = 10) %>% # locates the csv and excludes the first 10 lines in the csv as they are redundant
rename(Spare = "...6", # remove unused columns
hb_name = "Health Board Area 2019",
hb_population = Count) %>% # rename() formats the data to match the prescriptions dataframe
filter(Sex == "Female") %>% # filter female population, as men do not take ECP
select(hb_name, Age, hb_population) %>% # select columns of interest
mutate(hb_name = paste("NHS", hb_name)) %>% # change hb_name column format to match the Health Board dataframe format
clean_names()
# create dataset with total population for each health board
population_data <- all_population_data %>%
filter(age == "All people") %>%
select(hb_name, hb_population)
# select the population aged 16 to 34 years as this is the population the population most likely engaging with risk taking behaviors such as unprotected sex. I chose 16 as this is the age of consent in Scotland, and 34 as my cut off for young adults.
age_population_data <- all_population_data %>%
filter(age%in% c(16:34)) %>% # filter ages of interest
mutate(age = as.numeric(age)) %>% # make numerical so they can be placed in to buckets
mutate(age_group = case_when(between(age, 16,34)~'16-34')) %>% # make bucket 16-34 years
group_by(hb_name, age_group) %>%
summarise(pop_hb_16_34 = sum(hb_population)) # sum total population aged 16 to 34 per health board region.
Define a function to read in the prescription data for a defined year:
# #For efficiency I created a function to read in my prescriptions datasets.
# #I downloaded 12 months of prescription data for each year from 2019 to 2022. I placed the 12 months of data into their relevant folder named all_months_year, where year was specific to the data it contained.
# read_all_prescriptions <- function(year){
# all_files <- list.files(here("data", paste0("all_months_", year)), pattern = "csv") #list.files() retrieves files from the relevant directory, and the paste0() dynamically constructs the folder name based on the year variable placed into the function.
# all_prescriptions <- all_files %>%
# map_dfr(~read_csv(here("data", paste0("all_months_", year),.))) %>% #map_dfr() row-binds the datasets being red in
# clean_names() %>%
# drop_na(bnf_item_description) # drop the rows with missing bnf_item values
# return(all_prescriptions)
# }
Read in datasets using read_all_prescriptions() function and start tidying:
# all_prescriptions_2019 <- read_all_prescriptions(2019)
# all_prescriptions_2020 <- read_all_prescriptions(2020)
# all_prescriptions_2021 <- read_all_prescriptions(2021)
# all_prescriptions_2022 <- read_all_prescriptions(2022)
#
# # 2019 dataset has hbt2014 as a column name instead of hbt. Rename to make column name consistent
# all_prescriptions_2019 <- all_prescriptions_2019 %>%
# rename(hbt = "hbt2014")
#
# # combine 4 years of data into one dataset to make it easier to wrangle
# combined_prescriptions <- bind_rows(
# all_prescriptions_2019,
# all_prescriptions_2020,
# all_prescriptions_2021,
# all_prescriptions_2022 )%>% # I mutate early and select the prescriptions of interest to prevent a very large dataset from being stored in my environment (prevents R slowing down and crashing)
# mutate(
# date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
# drug_simple = case_when(
# str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
# str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
# TRUE ~ "Other")) %>% # used case_when() to group different dosages of the same drug
# filter(drug_simple != "Other",!is.na(date)) %>% #remove prescriptions of no interest and any missing date values
# filter(hbt != "SB0806") %>% # filter out SB0806 as it is not a health board (Scottish Ambulance Service)
# filter(!is.na(hbt))
# # save combined_prescriptions dataset to csv
# # write_csv(combined_prescriptions,"data/combined_prescriptions.csv")
# filter drugs of intereset before completing read in to prevent loading large datasets into the environment (prevents R from running slow and crashing)
combined_prescriptions <- read_csv(here("data","combined_prescriptions.csv")) %>%
mutate(
date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
drug_simple = case_when(
str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
TRUE ~ "Other")) %>% # used case_when() to group different dosages of the same drug
filter(drug_simple != "Other",!is.na(date))%>% # remove other prescriptions from the dataset and any missing date values
filter(hbt != "SB0806") %>% # filter out SB0806 as it is not a health board (Scottish Ambulance Service)
filter(!is.na(hbt))
Join and wrangle data:
# join the Prescriptions dataset to the Health Board names dataset and health board population data
ECP_scripts <- combined_prescriptions %>%
full_join(HB_names, by = c("hbt" = "hb")) %>% # Join with Health Board names
full_join(population_data, by = "hb_name") %>% # Join with population data
select(gp_practice, date, drug = drug_simple, hb_name, paid_quantity,hb_population) %>% # select columns of interest
mutate(month = factor(month(date), levels = 1:12, labels = month.abb), .after = date) %>% # extract month as a factor with labels to help when plotting
mutate(year = factor(year(date)), .after = month)
This graph explores seasonal trends in prescribing of the two most commonly prescribed emergency contraceptive pills, Levonogestrel and Ulipristal Acetate, between 2019 to 2022 by Health Board. I plotted this to see if events such as Valentines day, University term start dates, or national holidays effect emergency contraceptive prescribing rates. I included 4 years of consecutive data to see if a one-time event, such as the Common Wealth Games caused any spikes in emergency contraception prescribing. Additionally by plotting the 4 consecutive years we can see if Covid-19 lockdowns in the years 2020 and 2021 affected emergency contraception prescribing rates.
Plot: Population-Weighted Rates: Ensures that national trends are calculated based on the total population across all health boards, avoiding bias from smaller regions. Accurate National-Level Insights: Better reflects overall prescribing behaviour at the national level, aligning with your goal to analyse seasonal and national events. prescribing rates are weighted by the total population, providing an accurate reflection of overall prescribing behaviour across Scotland.
total_population <- sum(population_data$hb_population, na.rm=TRUE) # calculate total population of Scotland
# calculate the number of prescriptions of each drug per month
ECP_by_month <- ECP_scripts %>%
group_by(date, drug, month, year) %>%
summarise(total_quantity_month = sum(paid_quantity, na.rm = TRUE)) %>%
ungroup() %>%
drop_na(drug) %>%
clean_names()
# calculate prescription rate per 100,000 people for each drug and month
ECP_trend_plot_data <- ECP_by_month %>%
group_by(drug, year, month) %>%
summarise(prescriptions_per_100000 = (total_quantity_month / total_population)*100000)
ECP_trend_plot <- ECP_trend_plot_data %>%
ggplot(aes(x = month, y = prescriptions_per_100000, group = year, linetype = year, color = year)) + # plot data for each year on same graph
geom_line(aes(size = ifelse(year == "2020", 0.9, 0.7))) + # make the 2020 line slightly thicker to make it stand out
facet_wrap(~drug, scales = "free_y") + # separate the two types of ECP
scale_linetype_manual(values = c("2019" = "dotdash", "2020" = "solid", "2021" = "dashed", "2022" = "dotted"),name = "Year") + # make each year identifiable by linte type
scale_color_manual(values = c("2019" = "grey70", "2020" = "red", "2021" = "grey70", "2022" = "grey70"),name = "Year") + # make Covid-19 year red (2020 had the longest lockdown, March till May)
scale_size_identity() + # Ensures ggplot uses the size values as outlined in geom_line()
scale_y_continuous(limits = c(0, NA), breaks = scales::pretty_breaks(n = 5))+ # Ensure y-axis starts at 0 and has evenly spaced breaks
labs(
title = "Trends in Emergency Contraception Prescribing 2019–2022 in Scotland.",
subtitle = "Four years of population adjusted prescribing rates highlighting the impact of Covid-19 (red)",
x = "Month",
y = "Prescriptions per 100,000 women") +
theme_minimal() +
theme(
plot.title = element_text(size = 22, hjust = 0.5, face = "bold"),
plot.subtitle = element_text(size = 16, hjust = 0.5),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
axis.text.y = element_text(size = 9),
strip.text = element_text(size = 16),
legend.title = element_text(size = 12),
legend.text = element_text(size = 16),
legend.position = "bottom")
ECP_trend_plot
investigate what happened in 2019 - change in prescription practises over time ? did they used to use it for something else?
Trends to note:
It would be interesting to explore if different Health Boards prescribe different amounts of ECP.
to see if there is a trend between rate of prescribing of ECP and proportion of young people living in the HB
# tidy data to make table with columns and values of interest
ECP_anual_rate_data <- ECP_scripts %>%
group_by(hb_name, drug) %>% # aggregate data by Health Board and drug
summarise(total_quantity_4_years = sum(paid_quantity, na.rm = TRUE), # Total prescriptions over 4 years
avg_annual_total_quantity = total_quantity_4_years / 4, # average annual prescriptions
hb_population = first(hb_population), # hb_population is consistent within each hb_name
.groups = "drop") %>% # ungroup data
drop_na(drug) %>% # remove rows with no drug values
mutate(avg_annual_presc_100000 = avg_annual_total_quantity * 100000 / hb_population) %>% # average annual rate of prescriptions per 100,000
select(hb_name, drug, avg_annual_presc_100000, hb_population) %>% # select columns of interest before pivot
pivot_wider(names_from = drug, values_from = avg_annual_presc_100000, names_glue = "{drug}_rate") %>% # Rename columns for clarity
clean_names() # clean column names following pivot
# calculate total ECP prescription rate per health board
ECP_anual_rate_data <- ECP_anual_rate_data %>%
mutate(total_ECP_rate = rowSums(select(., levonorgestrel_rate, ulipristal_acetate_rate), na.rm = TRUE)) %>% #sum rates
arrange(desc(total_ECP_rate)) # arrange by total rate in descending order
#calculate percentage young people per health board
ECP_anual_rate_data_buckets <- ECP_anual_rate_data %>%
full_join(age_population_data) %>% # to get the population of those aged 16-34 years
group_by(hb_name) %>%
mutate(prop_young_ppl_hb = pop_hb_16_34/hb_population) %>% # calculate proportion of young people in each health board
ungroup()
annual_avg_ECP_table <- ECP_anual_rate_data_buckets %>%
select(hb_name, levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate,prop_young_ppl_hb) %>% # Select relevant columns
gt() %>%
cols_label(hb_name = "Health Board",
total_ECP_rate= "Total",
levonorgestrel_rate= " Levonorgestrel",
ulipristal_acetate_rate=" Ulipristal Acetate",
prop_young_ppl_hb = "% Aged 16-34 Years") %>% # rename columns to make reader-friendly
fmt_number(columns = c(levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate, prop_young_ppl_hb), decimals = 0) %>% # no decimal points as false accuracy detracts from the message in the data
cols_align(align = "center",
columns = c(levonorgestrel_rate,ulipristal_acetate_rate,total_ECP_rate, prop_young_ppl_hb)) %>% # centre column names
grand_summary_rows(columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate),
fns = list("Overall Average" = ~mean(., na.rm = TRUE)),
fmt = list(~ fmt_number(., decimals = 0))) %>% # add an overall average of the prescription rate columns
fmt_percent(columns = prop_young_ppl_hb, decimals = 0) %>% # add percentage sign
tab_header(title = md("**Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland.**"),
subtitle = md("Rate per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order.")) %>% # add a title and subtitle; md() allows text formatting from mark down
tab_spanner(label = md("*Prescription rate per 100,000 women*"),
columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate)) %>% # add a title to prescription rate columns.
tab_source_note(md("*Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community)*")) %>%
tab_stubhead(md("**2019-2022**")) %>%
tab_footnote(footnote = "includes Capital City, Edinburgh",
locations = cells_body(columns = hb_name, rows = 2))%>%
opt_stylize(style = 6, color = "cyan")
annual_avg_ECP_table
| Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland. | |||||
| Rate per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order. | |||||
| 2019-2022 | Health Board |
Prescription rate per 100,000 women
|
% Aged 16-34 Years | ||
|---|---|---|---|---|---|
| Levonorgestrel | Ulipristal Acetate | Total | |||
| NHS Greater Glasgow and Clyde | 3,376 | 29 | 3,404 | 26% | |
| NHS Lothian1 | 2,495 | 56 | 2,551 | 27% | |
| NHS Ayrshire and Arran | 2,027 | 15 | 2,043 | 19% | |
| NHS Tayside | 1,365 | 22 | 1,387 | 22% | |
| NHS Forth Valley | 1,268 | 73 | 1,342 | 22% | |
| NHS Grampian | 1,210 | 16 | 1,226 | 22% | |
| NHS Fife | 1,049 | 54 | 1,103 | 22% | |
| NHS Lanarkshire | 784 | 48 | 832 | 22% | |
| NHS Shetland | 779 | 17 | 796 | 19% | |
| NHS Orkney | 609 | 7 | 616 | 17% | |
| NHS Dumfries and Galloway | 515 | 24 | 539 | 17% | |
| NHS Western Isles | 192 | 197 | 389 | 16% | |
| NHS Highland | 285 | 52 | 337 | 17% | |
| NHS Borders | 242 | 76 | 318 | 17% | |
| Overall Average | — | 1,157 | 49 | 1,206 | — |
| Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community) | |||||
| 1 includes Capital City, Edinburgh | |||||
western isles relatively high uli prescription commpared to levo - explore if prescribing practices vary by location?
How does deprivation influence type of contraception prescribed? I wanted to explore if there was any correlation between the type of contraception being prescribed and deprivation. To do this I made a ratio of levonogestrel to total emergency contraception prescriptions.
A/(A+B) Where A = Levonogestrel (prescribed within 3 days of unprotected sex) B = Ulipristal Acetate (prescribed within 5 days of unprotected sex)
This is important to identify any variation in prescribing patterns in more deprived areas. Differences may suggest patients access health services later, so have to use Ulipristal Acetate, or that GPs or Pharmacists have differences in prescribing preferences in more deprived areas.
To measure deprivation I have used the Scottish Index of Multiple Deprivation. I took the SIMD 2020v2 dataset from Public Health Scotland website [available from: https://www.opendata.nhs.scot/gl/dataset/scottish-index-of-multiple-deprivation/resource/acade396-8430-4b34-895a-b3e757fa346e ] as this dataset contains SIMD decile weighted per Health Board population.
When interpreting SIMD in deciles rank 1 is considered the most deprived, and rank 10 is least deprived.
# read in datasets:
SIMD <- read_csv("https://www.opendata.nhs.scot/gl/dataset/78d41fa9-1a62-4f7b-9edb-3e8522a93378/resource/acade396-8430-4b34-895a-b3e757fa346e/download/simd2020v2_22062020.csv") %>%
clean_names() %>%
select(data_zone, simd2020v2hb_decile)
# I chose to use GP Practices and List sizes from October 2022, as this was the closest dataset I could find which correlated with the final year of my prescriptions datset
gp_addresses <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/1a15cb34-fcf9-4d3f-ad63-1ba3e675fbe2/download/practice_contactdetails_oct2022-open-data.csv") %>%
clean_names() %>%
select(practice_code, gp_practice_name, data_zone)
# Create ECP_GP by using the GP_addresses dataset to map the GP practice code to a datazone. I then used the column datazone to full_join() the SIMD dataset to the prescriptions dataset.
ECP_GP <- ECP_scripts %>%
filter(!gp_practice %in% c(99996, 99997, 99998)) %>% # remove dummy GP practice codes as they do not have a known gp practice code so cannot be mapped to a SIMD.
left_join(gp_addresses, by = c("gp_practice" = "practice_code")) %>%
#left_join(data_zones, by = "data_zone") %>%
left_join(SIMD, by = "data_zone") %>%
drop_na(simd2020v2hb_decile) %>% # need SIMD value to make barchart
group_by(gp_practice) %>%
mutate(total_quantity_per_gp = sum(paid_quantity)) %>%
clean_names()
# calculate the number of GPs per health board
toal_no_GP_per_hb <- ECP_GP %>%
filter(!gp_practice %in% c(99996, 99997, 99998)) %>% # remove dummy GP practice codes as they do not have a known gp practice code so cannot be mapped to a SIMD.
group_by(hb_name) %>%
summarise(unique_gp_count_per_hb = n_distinct(gp_practice))
# calculate average population per GP in a health board
pop_per_gp_hb <- population_data %>%
full_join(toal_no_GP_per_hb) %>%
mutate(avg_pop_per_GP = hb_population /unique_gp_count_per_hb)
# join pop_per_gp_hb to ECP_GP
ECP_pop_per_gp_hb <- ECP_GP %>%
left_join(pop_per_gp_hb)
# make bar chart
ECP_SIMD_barchart <- ECP_pop_per_gp_hb %>%
group_by(simd2020v2hb_decile, drug) %>%
summarise(
prescriptions_hb_gp = (total_quantity_per_gp *10000 / avg_pop_per_GP), # summarise the prescriptions per 100,000
.groups = "drop" # Ungroup data after summarisation
) %>%
ggplot(aes(x = prescriptions_hb_gp, y = factor(simd2020v2hb_decile, levels = 1:10), fill = drug)) +
geom_col() +
scale_fill_brewer(palette = "Set2", name = "Drug Type") + # add colour palette
labs(
title = "Barchart to show how the prescription rate of \n Emergency Contraceptive Pill varies by Scottish Index of Multiple Deprivation",
subtitle = "This chart shows the average prescription rate of ECP per SIMD, \n accounting for differences in number of GPs in each SIMD.",
x = "Prescriptions per 10,000 women",
y = "SIMD Decile \n (1 = Most Deprived, 10 = Least Deprived)",
fill = "Drug") +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 12, hjust = 0.5),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
axis.text.y = element_text(size = 10),
legend.position = "right",
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 10))
ECP_SIMD_barchart <- ggplotly(ECP_SIMD_barchart)
ECP_SIMD_barchart
Prescribing choices vary per regions
# load the NHS Health board Shapefile downloaded from learn page
NHS_healthboards <- st_read(here("data", "NHS_healthboards_2019.shp")) %>%
mutate(HBName = paste("NHS", HBName)) %>% # format to match ECP_scripts dataset
clean_names()
## Reading layer `NHS_healthboards_2019' from data source
## `C:\Data_science\B273025\data\NHS_healthboards_2019.shp' using driver `ESRI Shapefile'
## Simple feature collection with 14 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 7564.996 ymin: 530635.8 xmax: 468754.8 ymax: 1218625
## Projected CRS: OSGB36 / British National Grid
# calculate the ratio of Lev to Uli prescribed
variation_ECP_prescribed <- ECP_scripts %>%
group_by(hb_name,drug) %>%
# calculate the total of Lev and Uli prescribed per health board
summarise(total_each_drug_type = sum(paid_quantity, na.rm = TRUE)) %>%
drop_na(drug) %>%
#pivot_wider to move drug names to columns
pivot_wider(names_from = drug, values_from = total_each_drug_type) %>%
clean_names() %>% # consistency
mutate(levo_to_uli_ratio = levonorgestrel / (levonorgestrel + ulipristal_acetate)) # calculate ratio
# Join spatial data with variation_ECP_prescribed
variation_ECP_prescribed <- NHS_healthboards %>%
left_join(variation_ECP_prescribed)
#Create map in ggplot
map_variation_ECP_prescribed <- variation_ECP_prescribed %>%
ggplot(aes(fill = levo_to_uli_ratio)) +
geom_sf(size = 0.1, colour = "grey50", alpha =0.9) +
scale_fill_distiller(palette = "Blues", direction = 1) +
labs(title = "Geographical variation in Emergency Contraceptive Pill \n prescribing in Scotland.",
subtitle = "Heatmap showing Levonorgestrel prescriptions as a proportion of total Emergency \n Contraceptive Pill (ECP) prescriptions by Health Board region",
fill = "Levonorgestrel to ECP ratio", caption = "Data Source: Public Health Scotland Prescriptions in the Community, 2019-2022") +
coord_sf() +
theme_void() +
theme(
plot.title = element_text(face = "bold", size = 22, hjust=0.5),
plot.subtitle = element_text(size = 16, hjust=0.5),
legend.title = element_text(size = 12),
legend.text = element_text(size = 9, hjust=0.5),
legend.direction = "vertical",
legend.box = "horizontal") +
annotation_scale(location = "tl") +
annotation_north_arrow(
location = "tl",
pad_y = unit(0.5, "in"),
style = north_arrow_nautical(fill = c("grey40", "white"),line_col = "grey20"))
map_variation_ECP_prescribed
# calculate the ratio of Lev to Uli prescribed
variation_ECP_prescribed_v2 <- ECP_scripts %>%
group_by(gp_practice,drug) %>%
# calculate the total of Lev and Uli prescribed per health board
summarise(total_each_drug_type = sum(paid_quantity, na.rm = TRUE)) %>%
drop_na(drug) %>%
#pivot_wider to move drug names to columns
pivot_wider(names_from = drug, values_from = total_each_drug_type) %>%
clean_names() %>% # consistency
mutate(levo_to_uli_ratio = levonorgestrel / (levonorgestrel + ulipristal_acetate)) %>% # calculate ratio %>%
drop_na(levo_to_uli_ratio) %>%
full_join(ECP_GP)
# make bar chart
ECP_SIMD_barchart_variations <- variation_ECP_prescribed_v2 %>%
#group_by(simd2020v2hb_decile, drug) %>%
ggplot(aes(x = factor(simd2020v2hb_decile, levels = 1:10), y = levo_to_uli_ratio, fill = drug)) +
geom_col() +
scale_fill_brewer(palette = "Set2", name = "Drug Type") + # add colour palette
labs(
title = "Variation in prescribing practise by SIMD",
subtitle = "",
x = "SIMD Decile \n (1 = Most Deprived, 10 = Least Deprived)",
y = "prescriptions",
fill = "Drug") +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 12, hjust = 0.5),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
axis.text.y = element_text(size = 10),
legend.position = "right",
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 10))
ECP_SIMD_barchart_variations
I used https://gsverhoeven.github.io/post/zotero-rmarkdown-csl/ to set up citations. Chosen BMJ style as common.